Background

IMDb (Internet Movie Database) is one of the world’s most popular sources for movie, TV show, video game, and celebrity content. It contains information such as cast members, plot summaries, gross earnings, critic reviews, and much more. For any media fan, IMDb is a great source for researching favorite entertainment listings.

This report displays information such as gross earnings, ratings, votes, and runtime for various movies, and highlights differences between popular genres. Its goal is to suggest that some genres are more popular than others, and that this is reflected in movie characteristics such as gross earnings and ratings.

Visualization and Interpretations

Wordcloud

The word cloud images were generated from the titles of movies in each genre. Titles were split into words. Then, stopwords and rare characters were removed using regular expression pattern matching. The word frequencies were passed to the wordcloud2 function to generate each plot.
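The preprocessing steps above (split titles into words, drop stopwords and unusual characters, count frequencies) can be sketched as follows. This is a minimal illustration in Python rather than the R code behind the report; the stopword list, the regular expression, and the function name `title_word_frequencies` are all assumptions, since the report does not state which ones were used.

```python
import re
from collections import Counter

# Hypothetical stopword list; the report does not say which list was used.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "on", "at"}

def title_word_frequencies(titles, stopwords=STOPWORDS):
    """Count how often each word appears across a list of movie titles."""
    counts = Counter()
    for title in titles:
        # Assumed pattern: keep only runs of ASCII letters, so digits,
        # punctuation, and any other characters act as separators.
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts

# Made-up titles for illustration only (not taken from the scraped data).
sample_titles = ["The Ninja Attack", "Attack of the Boxer", "Ninja: Breaking Point"]
freqs = title_word_frequencies(sample_titles)
print(freqs.most_common(3))
```

The resulting word-to-count mapping is the kind of input a word cloud function expects. Note that a pattern restricted to ASCII letters splits a word at any accented character, which may be one reason truncated tokens such as gekij show up in the clouds.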

Action Movies: As expected, words associated with action occur frequently (boxer, attack, breaking, broken, kickboxer). Interestingly, some Japanese words also appear (gekij, jigoku, kenju).

Animation Movies: Gekijouban and gekij (special movie versions of TV series) appear far more often than other words. This is not surprising, since Japanese animation is well known and well rated.

Biography Movies: Names of people and events occur more often than other words. For example, we see Elizabeth, Alexander, Frank, and Auschwitz, which refers to the concentration camp in Poland.

Comedy Movies: Words such as like, jack, and lucky are associated with comedy, but ninja is a strange word to see here.


Crime Movies: Besides darkness and darkest, Japanese words such as jigok and kenju appear frequently in titles, as do places including Brooklyn and Ginza (one of the top shopping districts in Tokyo).


Drama Movies: The titles use rather common words. No particular pattern stands out.

Fantasy Movies: Godzilla, Frankenstein, and ninja occur more often than other words.

History Movies: There is some overlap with Biography: auschwitz, elizabeth, alexander, conquest. The reason could be that many movies are categorized into multiple genres.

Horror Movies:

Romance movies:

Sci-Fi:

War Movies: Bunker, Auschwitz, and flashback are related to war.

Overall, most of the word cloud results met our expectations, but some rare words appear frequently and some very common words appear in many genres. The visualizations are interesting, but they only provide surface-level information.

ggplot

  • The boxplot below shows the interquartile range of gross earnings for each movie genre. For each genre, the top 100 grosses were selected and plotted. Fantasy, Action, and Sci-Fi appear to be the frontrunners, while War, Horror, Biography, and History generate less revenue on average.

  • The bar plots below show the counts of movies by genre for each movie rating (or certificate). Note that the majority of all movies fall under the ratings of G, PG, PG-13, and R. Horror and War have the highest counts of R-rated movies, Thriller and Action have the highest counts of PG-13 movies, Animation has the highest count of PG movies, and Musical has the highest count of G-rated movies.

  • The histograms below show the counts of movies with specific runtimes for each genre. They suggest that Comedy and Animation movies typically run less than 100 minutes, while Sci-Fi and History movies tend to be on the longer side.

  • The boxplot below shows the interquartile range of rating scores, as determined by viewer votes, for each genre. The horizontal dashed line represents the mean rating score for all movies. The boxplots suggest that Drama films generally receive the highest ratings, while Horror movies receive the lowest ratings.
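To make the gross-earnings boxplot concrete, the sketch below shows one way to select the top grosses per genre and compute the quartiles that define each box. It is an illustration in Python, not the code behind the report; the function names, the "inclusive" quartile method, and the sample figures are assumptions.

```python
from collections import defaultdict
from statistics import quantiles

def top_n_gross_by_genre(movies, n=100):
    """Group (genre, gross) pairs by genre and keep the n largest grosses in each."""
    by_genre = defaultdict(list)
    for genre, gross in movies:
        by_genre[genre].append(gross)
    return {g: sorted(v, reverse=True)[:n] for g, v in by_genre.items()}

def box_summary(values):
    """Return (Q1, median, Q3), the three values that define the box in a boxplot."""
    # The quartile method is an assumption; other methods give slightly different cut points.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median, q3

# Made-up grosses in millions of dollars, for illustration only.
movies = [("Action", 300), ("Action", 120), ("Action", 80), ("Action", 500), ("Action", 40),
          ("War", 60), ("War", 20), ("War", 90), ("War", 10), ("War", 35)]
top = top_n_gross_by_genre(movies, n=3)  # n=3 keeps the toy example small
summary = {genre: box_summary(values) for genre, values in top.items()}
print(summary)
```

Because only the top grosses in each genre are kept, the boxes compare each genre's best earners rather than its typical movie, which is worth keeping in mind when reading the plot.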

gganimate

  • We used the gganimate package to generate some graphical animations. The racing bar graph, or bar chart race, has been growing in popularity since the start of this year. We tried to emulate the same visualization with the data we scraped. The racing bar graph shows the number of movies produced per year and ranks the genre with the most movies at the top.

  • The other animation presents the bar chart race in a different way, built with the plotly package. It allows the user to control the number of years on the x axis and observe the corresponding effect on movie counts by genre.

  • The third animation is a moving line chart for four randomly selected movie genres. It displays the number of movies over time as a rising or falling line for each genre. The hrbrthemes and viridis packages were used for this animation.
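A bar chart race needs, for every year, the genres ordered by movie count so that the leader can be drawn at the top. The sketch below shows one way to prepare that data. It is written in Python for illustration and is not the report's own code; the function name `yearly_genre_ranks`, the tie-breaking rule, and the sample records are assumptions.

```python
from collections import Counter, defaultdict

def yearly_genre_ranks(records):
    """Map each year to a list of (rank, genre, count), with the highest count ranked 1."""
    counts = defaultdict(Counter)
    for year, genre in records:
        counts[year][genre] += 1
    ranked = {}
    for year in sorted(counts):
        # Sort by descending count, then by genre name so ties are ordered deterministically.
        ordered = sorted(counts[year].items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[year] = [(rank, genre, n) for rank, (genre, n) in enumerate(ordered, start=1)]
    return ranked

# Made-up (year, genre) records, one per movie, for illustration only.
records = [(2018, "Drama"), (2018, "Drama"), (2018, "Comedy"),
           (2019, "Comedy"), (2019, "Comedy"), (2019, "Drama"), (2019, "Horror")]
ranks = yearly_genre_ranks(records)
for year, rows in ranks.items():
    print(year, rows)
```

Each year's ranked list corresponds to one state of the animation. The same per-year counts, without the ranking step, are the series a moving line chart would draw for each genre.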